I. INTRODUCTION



Risk management has become increasingly im- 
portant in financial institutions over the last 
decade. Since the publication of JP Morgan's 
RiskMetrics™ [1] in the nineties, Risk
Management and Risk Control departments in
banks have grown significantly in size and im- 
portance. The task is to fulfill regulatory re- 
quirements, to add transparency about a bank's 
risk profile by a quantitative assessment of 
risks, to develop the IT solutions needed to
process a bank's huge amount of data, and,
finally, to integrate this information into a
risk-return based steering process of the bank
(RoRAC, Return on Risk-Adjusted Capital).
Ultimately, a proper risk management and risk
control process is recognized
by rating agencies and investors so that share- 
holder value is added to the bank. 

Banks first focused on controlling potential
losses due to market fluctuations, such as
changes in the S&P 500 stock index or in
interest and currency exchange rates; this is
termed market risk. Internal market risk models
are nowadays rather mature and are accepted
by regulators for calculating the capital that
must be held as a buffer against such losses.
In contrast to these elaborate statistical models
for market risk, credit risks (i.e., risks due
to defaulted obligors) simply have to be covered
by capital amounting to 8% of the bank's
risk-weighted assets. Implicitly, this charge also
includes other
risks such as operational risks. Since the New 
Basel Accord on Capital Adequacy issued by 
the Basel Committee on Banking Supervision 
in February and September 2001 [2, 3, 4],
known as Basel II, it is clear that regulators will
require banks to hold equity capital against
operational risks explicitly.

A common industry definition of operational 
risk (OR) is the risk of direct or indirect losses 
resulting from inadequate or failed internal pro- 
cesses, people and systems or from external 
events. See [?] for a practice-oriented
introduction to the issue. Possible OR-risk
categories are (i) human processing errors,
e.g., mishandling of software applications, re- 
ports containing incomplete information, or 
payments made to incorrect parties without re- 
covery, (ii) human decision errors, e.g., unnec- 
essary rejection of a profitable trade or wrong 
trading strategy due to incomplete informa- 
tion, (iii) (software or hardware) system errors, 
e.g., data delivery or data import is not exe- 
cuted and the software system performs calcu- 
lations and generates reports based on incom- 
plete data, (iv) process design error, e.g., work- 
flows with ambiguously defined process steps, 
(v) fraud and theft, e.g., unauthorized actions 
or credit card fraud, and (vi) external damages, 
e.g., fire or earthquake. 

Thinking of these categories as "operational
risk processes", it is clear that there are
functionally defined dependencies between the
individual processes, which together make a
large organization work. Consider the following
example for illustration: a system error leads to
an incomplete data import into a risk calculation
engine, resulting in a wrong calculation of
risk figures, and eventually to a human decision 
error by the trader, who closes a possibly prof- 
itable position unnecessarily to reduce a risk 
which in fact does not exist. 

In the end misleading or lagging information, 
or system and workflow failures will always re- 
sult in financial loss for a bank. Indeed, prac- 
titioners have recognized these dependencies in 
operational risk events and mandated units like 
the internal audit and risk control departments 
to control processes for the bank, and created
functions such as a Chief Operating Officer (COO)
to optimize them. Operational risk error trees
between the above categories have been
formalized in more detail in [?].

Since the mid nineties financial markets have 
also attracted physicists in academia. One of 
the main reasons is that financial time series
exhibit several statistical peculiarities, many of
them being common to a wide variety of dif- 
ferent markets and instruments. As such they 
could possibly be "universal", i.e., independent 
of market details like instruments, country, and 
currency, and be the signature of collective
phenomena in financial markets (see [7, 8, 9, 10, 11]
and references therein). Collective phenomena 
have been widely studied in physics in the con- 
text of phase transitions. Collective phenomena 
are often responsible for insensitivity of over- 
all system behavior to details of an underlying 
dynamics. Specifically at phase transitions they 
give rise to power-law behavior, scale-invariance 
and self-similarity. Similar properties of finan- 
cial time series might therefore well be under- 
stood as a consequence of agents in a market 
acting collectively. 

Bringing together ideas from physics about col- 
lective phenomena and best industry practice 
for risk measurement, the present paper details
a possible statistical approach to determine the 
necessary equity capital to be held by banks to 
cover losses due to operational risks. In physical 
terms our model resembles a lattice gas with 
heterogeneous, functionally defined couplings. 
In such a description, bursts and avalanches of 
process failures correspond to droplet formation 
associated with a first order phase transition. 

The paper is organized as follows. In Sect. II 
we describe the Value-at-Risk concept for risk 
management and control. In Sect. III we
describe the approaches discussed in the context
of Basel II for operational risk measurement. A
new approach based on functionally dependent
correlations giving rise to collective behavior is
introduced in Sect. IV. Finally, Sect. V
summarizes our results.



II. THE VALUE-AT-RISK CONCEPT 



Risk management in banks is based on diver- 
sification, hedging and equity capital as loss 
buffer. The bank charges its customers a pre- 
mium for its risks so that (expected) losses in 
one market segment are on average compen- 
sated by profits in others. Other risks, espe- 
cially market and increasingly credit risks, are 
hedged (insured) via the derivative market. Un- 
expected losses, which are not diversified or 
hedged, are covered by the bank's equity capi- 
tal. How much capital a bank needs to cover its 
risks is determined by the so-called "Value-at- 
Risk" (VaR). VaR can be defined as the worst 
loss in excess of the expected loss that can 
happen under normal market conditions over 
a specified horizon T at a specified confidence 
level q. More formally, VaR measures the
shortfall from the q-quantile of the loss distribution
in excess of the expected loss, EL, within the
time period T, discounted at the risk-free rate r
to time t = 0:

$$\mathrm{VaR}_{q,T} = \left(Q_q[L(T)] - \mathrm{EL}\right) e^{-rT} , \qquad (1)$$

where the q-quantile worst case loss, $Q_q[L(T)]$,
is defined at confidence level q through

$$\mathrm{Prob}\left(L(T) > Q_q[L(T)]\right) = 1 - q . \qquad (2)$$

As indicated in (1), VaR depends on the
confidence level q and the risk horizon T. The choice
of these parameters depends on the application.
If VaR is simply used to report or compare
risks, these parameters can be chosen arbitrarily,
as long as they are consistent. If, however, VaR is
used as a basis for setting the amount of equity 
capital, the parameters must be chosen with ex- 
treme care: the confidence level must reflect the 
default probability of the bank within the risk 
horizon, and the risk horizon must be related 
to the liquidation period of risky assets, recov- 
ery time of ill-functioning processes, or, alter- 
natively, to the time period necessary to raise 
additional funds. This explains why regulators 
have chosen a high confidence level of 99% and a 
10-day horizon to determine the minimum capital
level for market risks. For credit risks and
capital allocation, banks choose even higher
values of q and T, about 99.95% and one year,
respectively.
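
As an illustration, Eqs. (1) and (2) might be evaluated on a sample of simulated losses roughly as follows. This is a minimal sketch: the function name `value_at_risk`, the empirical-quantile convention and the toy loss sample are assumptions made here for illustration and are not taken from the text.

```python
import math


def value_at_risk(losses, q, r=0.0, horizon=1.0):
    """Empirical VaR: (q-quantile of the losses - mean loss) * exp(-r * horizon)."""
    if not losses:
        raise ValueError("need at least one loss sample")
    ordered = sorted(losses)
    # Smallest sample value x whose empirical Prob(L <= x) is at least q.
    idx = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    quantile = ordered[idx]
    expected_loss = sum(ordered) / len(ordered)
    return (quantile - expected_loss) * math.exp(-r * horizon)


# Toy sample (hypothetical): losses 1, 2, ..., 100.
sample = [float(x) for x in range(1, 101)]
var_75 = value_at_risk(sample, q=0.75)            # quantile 75, mean 50.5
var_75_discounted = value_at_risk(sample, q=0.75, r=0.1, horizon=1.0)
```

In practice the sample would come from a loss simulation over the horizon T, and q would be set to the confidence level discussed above.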

In the financial industry there exist established 
statistical models for market and credit risk. 
Statistical models for operational risk are only
now beginning to be discussed in the risk
management community, especially in the context
of Basel II [?]. Whereas internal market risk
models are already recognized by regulators and
are also used in banks for capital allocation,
regulators are much more critical of internal
statistical models for credit and operational risk.
This has less to do with mathematical complexity
(although credit and operational risks are more
difficult to model than market risks) than with
the input data, which are much harder to validate
than in the case of market risk.



III. INDUSTRY STANDARDS FOR 
OPERATIONAL RISK MEASUREMENT 

The Basel Committee on Banking Supervision
has proposed three alternative approaches to
operational risk measurement [?]: the
"Basic-Indicator Approach (BIA)", the "Standardized
Approach (SA)", and the "Advanced Measurement
Approach (AMA)". In the BIA the required
capital for operational risk is determined by
multiplying a single financial indicator, gross
income (interest, provision, trading, and other
income), by a fixed percentage (called the
$\alpha$-factor). The SA differs from the BIA in
that banks are allowed to choose business line
specific weight factors, $\beta_k$, for the
gross income indicator, $I_k$, of the $k$-th business
line. The total regulatory capital charge, RC,
is the simple sum of the capital required per
business line,

$$\mathrm{RC} = \sum_k \beta_k \times I_k . \qquad (3)$$

The weight factors $\alpha$ and $\beta_k$ are calibrated such
that the required regulatory capital for
operational risk would be 17 - 20% of the current
regulatory capital by the standards of an average bank.
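
The two simple charges can be written down in a few lines. The sketch below only restates the BIA rule and Eq. (3); the function names, business-line labels and all numerical values are hypothetical and are not regulatory calibrations.

```python
def basic_indicator_charge(gross_income, alpha):
    """BIA: a fixed fraction alpha of total gross income."""
    return alpha * gross_income


def standardized_charge(gross_income_by_line, beta_by_line):
    """SA, Eq. (3): RC = sum over business lines k of beta_k * I_k."""
    missing = set(gross_income_by_line) - set(beta_by_line)
    if missing:
        raise KeyError(f"no weight factor for business lines: {sorted(missing)}")
    return sum(beta_by_line[k] * income
               for k, income in gross_income_by_line.items())


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
income = {"retail": 100.0, "trading": 200.0}
betas = {"retail": 0.10, "trading": 0.20}
rc_sa = standardized_charge(income, betas)        # 0.10*100 + 0.20*200
rc_bia = basic_indicator_charge(sum(income.values()), alpha=0.15)
```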

The AMA consists of three sub-categories: the
"Internal Measurement Approach (IMA)", the
"Loss Distribution Approach (LDA)" and the
"Scorecard Approach (SCA)". It is a more
advanced approach as it allows banks to use
external and internal loss data as well as internal
expertise.

In the IMA the required capital is calculated
as the sum over multiples of the expected loss
per OR-risk category/business line cell,

$$\mathrm{RC} = \sum_{i,k} \gamma_{ik} \times \mathrm{EL}_{ik} , \qquad (4)$$

where i is the risk category and k the business
line. The expected loss is quantified as the
product of the annual OR-event probability, an
exposure indicator per business line and risk
category, and the loss percentage per exposure. All
parameter estimates have to be disclosed to the
supervisors. Since the $\gamma$-factor is computed from
an industry-based distribution, it will be possible
to adjust the capital charge by a risk profile
index, which accounts for the bank's specific
risk profile compared to the industry.

The LDA and SCA are very similar as both ap- 
proaches are based on a statistical VaR-model. 
Details of the LDA are outlined in Ref. [?].
In both approaches the bank estimates, for
each risk category/business line cell, the
probability distributions of the annual event
frequency and of the loss severity (= exposure ×
loss fraction per exposure). The difference
between the LDA and the SCA is that in the
former only internal or external historical loss 
data are used for estimating the distribution 
functions. In addition to this, banks are also 
allowed to apply expert knowledge to estimate 
the distribution functions in the SCA. This is 
a forward looking approach. It is particularly 
suited for operational risk, as processes that 
have failed are usually changed; hence histori- 
cal loss data could provide potentially mislead- 
ing information. Even if banks have an exhaus- 
tive internal database of losses, it can hardly be 
considered as representative of extreme losses. 
Hence, expert assessments and external loss 
databases are necessary. The problem with the 
latter data source is that external historical 
losses must be scaled to fit the balance sheet 
of the bank (it must be possible that the losses 
can occur in the bank). 

Popular choices for the loss severity distribution
function are the lognormal, Gamma, Beta, and
Weibull distributions. Common choices for
the loss frequency distribution function are the
Poisson and negative binomial distributions. In a
top-down approach different OR-risk categories
are assumed to be independent. For each business
unit and for each OR-category the OR-loss
is simulated in a Monte-Carlo simulation
by drawing a realization $N_{ik}$ from the loss
frequency distribution and sampling $N_{ik}$ realizations
$X_{ik}^{m}$ of the loss severity ($m = 1, \dots, N_{ik}$).
The loss in such a sample is

$$L_{ik} = \sum_{m=1}^{N_{ik}} X_{ik}^{m} . \qquad (5)$$

Drawing a histogram of outcomes of $L_{ik}$ provides
the loss distribution function per risk
category/business line cell. The Value-at-Risk
is read off from the tail in excess of the
expected loss as described in Sect. II. Due to
the assumption of statistical independence, the
loss distribution can also be calculated
analytically as the convolution product of the loss
frequency and the loss severity distribution.
The required capital for the bank as a whole
can be calculated in two ways: either as the
simple sum of the capital charges across all
risk category/business line cells, which is the
method given by the Basel Committee on Banking
Supervision in the Internal Measurement
Approach, or by extending the MC-sampling
beyond the individual risk category/business line
cell via $L = \sum_{i,k} L_{ik}$, which takes diversification
between the risk category/business line cells into
account.
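
The Monte-Carlo procedure for a single risk category/business line cell can be sketched as follows, assuming a Poisson frequency and a lognormal severity. The helper names, the seed and all parameter values are illustrative assumptions; the Poisson sampler uses Knuth's multiplication method only because the standard library offers no Poisson generator.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson(lam) count with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_cell_losses(rng, lam, mu, sigma, n_runs):
    """Return n_runs samples of L_ik = sum_{m=1}^{N_ik} X_ik^m, cf. Eq. (5)."""
    losses = []
    for _ in range(n_runs):
        n_events = sample_poisson(rng, lam)                    # N_ik
        losses.append(sum(rng.lognormvariate(mu, sigma)        # X_ik^m
                          for _ in range(n_events)))
    return losses


def excess_quantile(losses, q):
    """Empirical q-quantile of the sampled losses in excess of their mean."""
    ordered = sorted(losses)
    idx = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[idx] - sum(ordered) / len(ordered)


rng = random.Random(1)                                         # arbitrary seed
cell_losses = simulate_cell_losses(rng, lam=2.0, mu=0.0, sigma=0.5, n_runs=20000)
var_999 = excess_quantile(cell_losses, q=0.999)
```

Summing such samples over cells run by run, rather than summing the per-cell capital charges, corresponds to the second aggregation method described above.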

A critical point which concerns all presently 
discussed approaches is the correlation between 
OR-losses. In this paper our focus will be on how
correlations and dependencies between OR-risk 
events can be integrated in the LDA/SCA. 



IV. FUNCTIONAL CORRELATION 
APPROACH FOR OPERATIONAL RISK 

Since Markowitz's seminal work on portfolio
theory [?, ?], diversification and dependencies
between risk events have been modeled by the
covariance of stochastic processes. Because
empirically only the mean and the covariance of
these processes can be reliably determined from
market data, it is common practice to choose
correlated Gaussian white noise for modeling
correlations. As a consequence the loss
distribution is unimodal with frequent small losses
and a few extreme losses, which (depending
on the distribution of the loss severity) are
responsible for fat tails in the loss distribution.
Collective losses or even crashes such as bursts
and avalanches of losses are not contained in
this description.

A main point in this paper is that this
stochastic dependency of risk events is not
sufficient for all risk categories: one frequently also
observes direct, functional and non-stochastic
dependencies. Functional dependency between 
risk events is most pronounced in operational 
risk events. Processes in a (large) organization 
are usually organized so as to mutually sup- 
port each other. Thus, if a process fails, this 
will usually be detrimental to other processes 
relying on receiving input or support of some 
sort from the failing process in question, so that 
they run a higher risk of failing as well. It there- 
fore seems inadequate to model operational risk 
events individually per risk category/business 
line cell and aggregate losses afterwards over 
some covariance matrix, which would be the 
choice when approaching operational risks
analogously to market risks. In the following we
extend the LDA/SCA by taking the functional
dependencies between processes into account. 

We consider a simple two-state model here,
i.e., a process can be either up and running
or down. For the process corresponding to the
OR-event i we designate these states as $n_i = 0$
and $n_i = 1$, respectively. In the following we will
skip the business line index k for simplicity.

The interest is in obtaining reliable estimates 
of the statistics of processes that are down at 
any time and of the statistics of losses incurred 
at any time. As the loss severity incurred by 
a given process going into the down state may 
vary randomly from event to event, solving the 
latter problem requires convolving the statistics 
of down-events with the loss severity distribu- 
tion related to the process failures. 

The reliability of individual processes will vary 
(randomly) across the set of processes, and so 
will the degree of functional interdependence. 
These random heterogeneities constitute an el- 
ement of quenched disorder, whereas the loss 
severities incurred by down processes consti- 
tute an element of annealed disorder as they are 
(randomly) determined anew from their distri- 
bution each time a process goes down. An ap- 
pealing feature for the modeler of operational 
risks therefore is the independence of the dy- 
namic model of the interacting processes and 
the loss severity model (i.e., the estimate of the
PDFs of loss severity incurred by individual
process failures). A typical assumption for the
latter is to take them as being distributed ac- 
cording to a log-normal distribution with suit- 
able parameters for means and variances, which 
we will choose in the following. 



A. Dynamics 

To motivate the dynamics of the functional
approach, note that all processes need a certain
amount of "fueling" or support in order
to maintain a functioning state for the time
increment $t \to t + \Delta t$ within the risk horizon,
$t \in [0, T)$ (think of human resources,
information, input from other processes, etc.). Here,
only the generic features of the model shall be
outlined. Hence, the increment $\Delta t$ is chosen
such that all processes can fully recover within
this time interval, i.e., the state $n_i$ of each
process can flip at each time step. For practical
applications in banks, one would model the recovery
process more carefully: a specific death period
after the failure of the $i$-th process would be
considered, and one would differentiate between
process failures being discovered and adjusted
up to a certain cut-off time, e.g., end-of-day,
at which a process would have been completed
[?]. These features are not generic and can only
be discussed in relation to a specific OR-event
under consideration.

We denote by $h_i(t)$ the total support received
by process i at time t, and choose it to take the
form

$$h_i(t) = \vartheta_i - \sum_j w_{ij}\, n_j(t) + \eta_i(t) . \qquad (6)$$

That is, it is composed of (i) the average total
support $\vartheta_i$ that would be provided by a
fully operational network of processes (in which
$n_i(t) = 0$ for all i). This quantity is (ii)
diminished by support that is missing because of
failing processes which normally feed into the
process in question; (iii) lastly, there are
fluctuations $\eta_i(t)$ about the average, which we take to
be correlated Gaussian white noise with (by
properly renormalizing $\vartheta_i$ and $w_{ij}$) zero mean
and unit variance. Correlated Gaussian noise is
introduced to model equal-time cross correlations
between OR-risk categories in analogy to
the approach proposed by the Basel Committee
on Banking Supervision for credit risk,

$$\eta_i(t) = \sqrt{\rho}\, Y(t) + \sqrt{1-\rho}\, \epsilon_i(t) , \qquad (7)$$

where $Y(t) \sim \mathcal{N}(0, 1)$ is a common factor for
all OR-risk categories with equal-time correlation
coefficient $\rho$, and the $\epsilon_i(t) \sim \mathcal{N}(0, 1)$ are
idiosyncratic terms.

Note that non-linear effects could be included
by modifying (6) to
$h_i(t) = \vartheta_i - \sum_j w_{ij} n_j(t) - \sum_{j,k} w_{ijk} n_j(t) n_k(t) - \dots + \eta_i(t)$.
Note also that, as in credit risk modeling, the
common factor $\sqrt{\rho}\,Y(t)$ could further be
decomposed into sector contributions,
$\sqrt{\rho}\,Y(t) \to \sum_k \beta_{ik} Y_k(t)$, so
as to describe more complicated equal-time
correlations. To keep this treatment transparent,
we will present the formalism without these
extensions.

Process i will fail in the next time instant
$t + \Delta t$ if the total support for it falls below
a critical threshold. By properly renormalizing
we can choose this threshold to be zero, thus
($\Theta$ is the step function: $\Theta(x) = 1$ for $x > 0$ and
0 else)

$$n_i(t + \Delta t) = \Theta\Big( -\vartheta_i + \sum_j w_{ij}\, n_j(t) - \sqrt{\rho}\, Y(t) - \sqrt{1-\rho}\, \epsilon_i(t) \Big) . \qquad (8)$$



The losses incurred by process i are then
updated according to

$$L_i(t + \Delta t) = L_i(t) + n_i(t + \Delta t)\, X_i^{t+\Delta t} , \qquad (9)$$

where $X_i^{t+\Delta t}$ is randomly sampled from the loss
severity distribution for process i. Note that 
the process dynamics is independent of assump- 
tions concerning their loss severity distributions 
within the present model. 
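
One parallel time step of the dynamics defined by Eqs. (6)-(9) can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the function `step`, the callback `draw_severity` and the dense list-of-lists coupling matrix are hypothetical choices made here.

```python
import math
import random


def step(rng, n, theta, w, rho, draw_severity):
    """One parallel update n(t) -> n(t + dt) following Eqs. (6)-(9).

    n: list of 0/1 states (1 = down); theta: supports theta_i;
    w: couplings w[i][j]; rho: common-factor correlation;
    draw_severity(rng, i): hypothetical callback returning one loss sample.
    Returns the new states and the losses incurred in this step.
    """
    common = rng.gauss(0.0, 1.0)                                   # Y(t)
    new_n, losses = [], []
    for i in range(len(n)):
        eps = rng.gauss(0.0, 1.0)                                  # eps_i(t)
        eta = math.sqrt(rho) * common + math.sqrt(1.0 - rho) * eps # Eq. (7)
        missing = sum(w[i][j] * n[j] for j in range(len(n)))
        support = theta[i] - missing + eta                         # Eq. (6)
        down = 1 if support < 0.0 else 0                           # Eq. (8)
        new_n.append(down)
        losses.append(draw_severity(rng, i) if down else 0.0)      # Eq. (9)
    return new_n, losses


# Usage with illustrative numbers: three processes, lognormal severities.
rng = random.Random(0)
theta = [2.3, 2.3, 2.3]
w = [[0.0, 0.4, 0.4], [0.4, 0.0, 0.4], [0.4, 0.4, 0.0]]
state, step_losses = step(rng, [0, 0, 0], theta, w, rho=0.0,
                          draw_severity=lambda r, i: r.lognormvariate(0.0, 0.4))
```

The severity draw enters only through the callback, which mirrors the independence of process dynamics and loss severity model noted above.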

One can integrate over the distribution of
idiosyncratic noises to obtain the conditional
probability for failure of process i given a
configuration $n(t) = \{n_i(t)\}$ of down-processes and
a realization of the common factor $Y(t)$ at time
t,

$$\langle n_i(t + \Delta t) \rangle_{n(t),Y(t)} = \mathrm{Prob}\big( n_i(t + \Delta t) = 1 \,\big|\, n(t), Y(t) \big) = \Phi\left( \frac{\Phi^{-1}(p_i) + \sum_j w_{ij}\, n_j(t) - \sqrt{\rho}\, Y(t)}{\sqrt{1-\rho}} \right) . \qquad (10)$$

Here $\Phi(x)$ denotes the cumulative normal
distribution. Note that we have set
$\vartheta_i = -\Phi^{-1}(p_i)$, where $p_{i,\Delta t} \equiv p_i$ is the
unconditional expected probability for process failure
within the time increment $\Delta t$. This is
consistently justified by setting $n_j(t) = 0$ for all j and
$\rho = 0$ in Eq. (10). Note that up to the functional
term $\sum_j w_{ij} n_j(t)$ this approach corresponds
to the approach adopted by the Basel Committee
on Banking Supervision for credit risk [4].
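
The conditional failure probability of Eq. (10) is straightforward to evaluate numerically. The sketch below uses `statistics.NormalDist` from the Python standard library for $\Phi$ and $\Phi^{-1}$; the function name and the example numbers are assumptions made for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def failure_probability(i, n, p, w, rho, y):
    """Eq. (10): Prob(n_i(t + dt) = 1 | n(t), Y(t) = y).

    p[i] must lie strictly between 0 and 1 and rho must be below 1.
    """
    functional = sum(w[i][j] * n[j] for j in range(len(n)))
    drive = STD_NORMAL.inv_cdf(p[i]) + functional - math.sqrt(rho) * y
    return STD_NORMAL.cdf(drive / math.sqrt(1.0 - rho))


# Illustrative two-process example.
p = [0.01, 0.02]
w = [[0.0, 0.5], [0.5, 0.0]]
all_up = failure_probability(0, [0, 0], p, w, rho=0.0, y=0.0)     # recovers p[0]
other_down = failure_probability(0, [0, 1], p, w, rho=0.0, y=0.0) # raised by w[0][1]
```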

The couplings $w_{ij}$ can be determined by
considering the transition probabilities,
$p_{ij,\Delta t} \equiv p_{ij}$, for failure of process i within the
time increment $\Delta t$, given that in the configuration
at time t process j is down, whereas all other
processes are running, and $Y(t) = 0$. Introducing
the shorthand $C_j$ for this configuration, we
can write

$$p_{ij} = \mathrm{Prob}\big( n_i(t + \Delta t) = 1 \,\big|\, C_j \big) = \Phi\left( \frac{\Phi^{-1}(p_i) + w_{ij}}{\sqrt{1-\rho}} \right) . \qquad (11)$$

This leads to

$$w_{ij} = \sqrt{1-\rho}\; \Phi^{-1}(p_{ij}) - \Phi^{-1}(p_i) . \qquad (12)$$

Analogous identities would be available for
determining higher order connections $w_{ijk}$, if
non-linear effects were taken into account. Note that
the probabilities for process failure depend only 
on the increment At and not on the time t due 
to the stationarity of the dynamics. 
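
For illustration, the inversion in Eq. (12) and its consistency with Eq. (11) can be checked numerically. The function names and the probabilities used below are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def coupling(p_i, p_ij, rho=0.0):
    """Eq. (12): w_ij = sqrt(1 - rho) * Phi^{-1}(p_ij) - Phi^{-1}(p_i)."""
    return (math.sqrt(1.0 - rho) * STD_NORMAL.inv_cdf(p_ij)
            - STD_NORMAL.inv_cdf(p_i))


def conditional_probability(p_i, w_ij, rho=0.0):
    """Eq. (11): failure probability of i when only j is down and Y(t) = 0."""
    return STD_NORMAL.cdf((STD_NORMAL.inv_cdf(p_i) + w_ij)
                          / math.sqrt(1.0 - rho))


# Round trip with illustrative probabilities p_i = 1%, p_ij = 2.6%.
w_example = coupling(0.01, 0.026, rho=0.1)
p_back = conditional_probability(0.01, w_example, rho=0.1)
```

At $\rho = 0$, a conditional probability equal to the unconditional one gives a vanishing coupling, and $p_{ij} > p_i$ gives a positive one.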
To illustrate how these parameters are fixed in
practice, consider the following. Either from a
historical loss database or from an expert
assessment, the following two questions must be
answered per OR-risk category and business
line:

1. What is the expected period, $\langle \tau_i \rangle$, until
process i fails for the first time in a fully
operative environment, and

2. given that only process j has failed, what
is the expected period, $\langle \tau_{ij} \rangle$, for process i
to fail also?

Noting that Prob(first failure at $z\Delta t$) $= (1 - p_i)^{z-1} p_i$,
one finds that

$$\langle \tau_i \rangle = \sum_{z=1}^{\infty} z \Delta t\, (1 - p_i)^{z-1} p_i = \frac{\Delta t}{p_i} , \qquad (13)$$

and analogously,

$$\langle \tau_{ij} \rangle = \sum_{z=1}^{\infty} z \Delta t\, (1 - p_{ij})^{z-1} p_{ij} = \frac{\Delta t}{p_{ij}} . \qquad (14)$$

These identities express the $p_i$ and $p_{ij}$ in terms
of estimated average times to failure, and are
used to fix the model parameters completely.
Note that according to (11) $p_{ij}$ can be
interpreted as a non-equal-time correlation for
process failures.
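
The closed forms in Eqs. (13) and (14) follow from differentiating the geometric series. A short worked derivation is given below; the numerical example after it uses hypothetical values.

```latex
\begin{align*}
\sum_{z=1}^{\infty} z\,x^{z-1}
  &= \frac{d}{dx}\sum_{z=0}^{\infty} x^{z}
   = \frac{d}{dx}\,\frac{1}{1-x}
   = \frac{1}{(1-x)^{2}}, \qquad |x| < 1, \\
\langle \tau_i \rangle
  &= \Delta t\, p_i \sum_{z=1}^{\infty} z\,(1-p_i)^{z-1}
   = \frac{\Delta t\, p_i}{p_i^{2}}
   = \frac{\Delta t}{p_i}
   \quad\Longrightarrow\quad
   p_i = \frac{\Delta t}{\langle \tau_i \rangle}.
\end{align*}
```

For example, with $\Delta t$ = 1 day, an estimated $\langle \tau_i \rangle$ of 100 days gives $p_i = 0.01$, and an estimated $\langle \tau_{ij} \rangle$ of 40 days gives $p_{ij} = 0.025$, i.e. a ratio $p_{ij}/p_i = 2.5$.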

Note also that, incidentally, the dynamics (8)
resembles that of a lattice gas (defined on a
graph rather than on a lattice), the $n_i$ denoting
occupancy of a vertex, the $w_{ij}$ interactions, and
the $\vartheta_i$ taking the role of chemical potentials
regulating a-priori occupancy of individual vertices.
The present system is heterogeneous in that (i)
the $\vartheta_i$ vary from site to site, and (ii) the couplings
$w_{ij}$ have a functional rather than regular
geometric dependence on the indices i and j
designating the vertices of the graph. Moreover, in
the physics context, one usually assumes noise
sources other than Gaussian, so that cumulative
probabilities are described by Fermi functions
rather than cumulative normal distributions as
above. The quantitative difference is minute,
however.

The model dynamics as such cannot be solved 
analytically for a general heterogeneous net- 
work. We shall resort to Monte-Carlo simula- 
tions to study its salient properties. The main 
qualitative features can, however, also be
observed in the simplified situation of a
homogeneous network consisting of identical processes
having the same connectivity at each node. A
mean-field analysis of such a simplified situation
will be given as well. As the presence of
the common factor expressed by the $\rho$-term in
Eq. (8) would influence only quantitative details
of the system's behavior, we will further
present the analysis without correlation to the
common factor by setting $\rho = 0$.



B. Key Features 

Key features of the collective behavior of net- 
works of interacting processes can easily be an- 
ticipated either directly from a discussion of the 
dynamic rules, or from the analogy with the 
physics of lattice gases.

Consider a network in which the unconditional 
probabilities for process failures, pi, are small, 
but process interdependence is large and con- 
sequently conditional probabilities for process 
failures, pij, are sizeable. In such a situation, 
spontaneous failure of individual processes may 
induce subsequent failures of other processes 
with sufficiently high probability so as to trig- 
ger a breakdown of the whole network. If, on 
the other hand, process interdependence re- 
mains below a critical threshold value, indi- 
vidual spontaneous failures will not have such 
drastic consequences, and the whole network 
will remain in a stable overall functioning state. 

Of particular interest for the risk manager is
the case in which process interdependence is
low enough to make a self-generated breakdown
of the network extremely unlikely, but
parameters are nevertheless such that a stable
overall functioning state of the network coexists
with a phase in which nearly the complete
network is in the down state (two-phase
coexistence). In such a situation, external strain
can induce a transition from a stably functional
situation to overall breakdown. Analogous
mechanisms are believed to be responsible for
occasional catastrophic breakdowns in bistable
ecosystems.

With increasing unconditional probabilities for
process failures it becomes meaningless to
distinguish between an overall functioning and
a non-functioning phase of the network;
two-phase coexistence ceases to exist at a critical
point, as in (lattice) gases.



C. Simulations 

In the following we validate some of our
intuitions about global network behavior using
Monte-Carlo simulations.

The Monte-Carlo dynamics can either be
conceived as parallel dynamics (all $n_i$ are updated
simultaneously at each time step according to
(8) or (10)), or as (random) sequential dynamics
(only a single $n_i$ is (randomly) selected for
update according to (8) in any given time step,
in which case the time increment must scale
with the number N of processes in the net as
$\Delta t \sim N^{-1}$).

For the analysis of operational risks, losses are
accumulated during a Monte-Carlo simulation
of the process dynamics over the risk horizon,
T. Runs over many risk horizons then allow one to
measure loss distribution functions for individual
processes within the network of interacting
processes, or for business units or the full
network by appropriate summations.

For the simulations, we choose a random setting,
i.e., unconditional failure probabilities are
taken to be uniformly distributed in the
interval $[0, p^{\max}]$, and we determine random
conditional failure probabilities as $p_{ij} = p_i (1 + \epsilon_{ij})$,
with $\epsilon_{ij}$ uniformly distributed in $[0, \epsilon^{\max}]$,
which fixes the ratio $(p_{ij}/p_i)^{\max}$.
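
A random setting of this kind could be generated as sketched below. The function name and the all-to-all connectivity are assumptions, since the text does not state which processes are coupled; with this construction the largest possible ratio is $1 + \epsilon^{\max}$, so the illustrative choice of 1.6 corresponds to a maximal ratio of 2.6.

```python
import random


def random_setting(rng, n_proc, p_max, eps_max):
    """Draw p_i ~ U[0, p_max] and p_ij = p_i * (1 + eps_ij), eps_ij ~ U[0, eps_max].

    Assumption: every ordered pair (i, j) with i != j is coupled. The diagonal
    entry is set to p_i, i.e. a process has no coupling to itself.
    """
    p = [rng.uniform(0.0, p_max) for _ in range(n_proc)]
    p_cond = [[p[i] * (1.0 + rng.uniform(0.0, eps_max)) if i != j else p[i]
               for j in range(n_proc)]
              for i in range(n_proc)]
    return p, p_cond


rng = random.Random(42)                      # arbitrary seed
p, p_cond = random_setting(rng, n_proc=50, p_max=0.02, eps_max=1.6)
```

Before converting such probabilities into couplings, values of exactly 0 would have to be excluded, because the inverse cumulative normal distribution is not finite there.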

Fig. 1 shows a situation where a functional net- 
work coexists with a situation in which the net- 
work is completely down, and parameters are 
such that spontaneous transitions between the 
phases are not observed during a simulation. 
The upper track shows the loss record of a sys- 
tem initialized in the "all down" state whereas 
the lower track exhibits the loss record of the 
same network initialized in the fully function- 
ing state. 

The loss distribution for the functional network
is unimodal with a bulk of small losses and
a fat tail of extreme losses, which are driven by 
the loss severity distribution. 

By increasing the functional interdependence 
at unaltered unconditional failure probabilities, 
the functioning state of the network becomes 
unstable. A spontaneous transition into the 
'down' state is observed during a single run of 
50000 Monte-Carlo steps. Two interesting fea- 
tures about this transition to complete break- 
down deserve mention: (i) the time to break- 
down can vary within very wide limits (we have 
not attempted to measure the distribution of 
times to breakdown and its evolution with sys- 
tem parameters such as range of values for con- 
ditional failure probabilities), (ii) there are no 
detectable precursors to the transition; it
occurs due to large spontaneous fluctuations
carrying the system over a barrier, in analogy to
droplet formation associated with first order
phase transitions.

We should like to emphasize that realistically 
the system dynamics after an overall break- 
down of a process network would no longer be 
the spontaneous internal network dynamics: re- 
covery efforts would be started, increasing sup- 
port for each process by a sufficient amount 
such as to reinitialize the network in working 
order. 

Repeated spontaneous transitions in both di- 
rections can be observed only in a rather small 
network (Fig. 2). The corresponding loss distri- 
bution will be bimodal. 

Fig. 3 illustrates the principle of a strain
simulation. In each case, the system, if in the
operational state, is repeatedly put under
external strain by turning off 5 randomly selected
functioning processes every 1000th time step,
and letting the system evolve under its internal 
dynamics thereafter. Such a disturbance can ei- 
ther trigger a breakdown of the system or not. 
In the former case, if the system is found fully 
down 1000 time steps later, it is reinitialized in 
the fully operational state and once more dis- 
turbed 1000 steps later. 
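
The disturbance itself can be sketched as a small helper. The function names are hypothetical, and the sketch covers only the strain and the breakdown check, not the internal dynamics between disturbances.

```python
import random


def apply_strain(rng, n, k):
    """Return a copy of the state list n with up to k running processes turned off."""
    running = [i for i, state in enumerate(n) if state == 0]
    hit = rng.sample(running, min(k, len(running)))
    strained = list(n)
    for i in hit:
        strained[i] = 1          # 1 = down
    return strained


def fully_down(n):
    """True if every process is in the down state."""
    return all(state == 1 for state in n)


# Schedule described in the text: every 1000th step, reinitialize the network
# if it is found fully down, otherwise apply the strain to the running system.
rng = random.Random(0)
state = [0] * 50
state = [0] * len(state) if fully_down(state) else apply_strain(rng, state, 5)
```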

One observes that the operational low-loss
phase, which we have seen in Fig. 1 to coexist
with the non-operational phase, is resilient
against disturbances of the kind described
above. The same is true if $(p_{ij}/p_i)^{\max}$
is increased to 2.7. At $(p_{ij}/p_i)^{\max} = 2.8$ an
external strain succeeds once during the simulation
in triggering a breakdown of the net, whereas
at $(p_{ij}/p_i)^{\max} = 2.9$ breakdown under external
strain of the given strength is the regular
response of the system (with a few exceptions
and occasional spontaneous recoveries).

Of interest to the risk manager in the end are
the total losses accumulated over a risk horizon,
T,

$$L(T) = \sum_i L_i(T) , \qquad (15)$$

or, more specifically, the corresponding
probability density function. Fig. 4 presents such a
distribution of accumulated losses for a network
that remains operational throughout the
simulation. For this simulation we have chosen
$T = 365\,\Delta t$. The loss distribution has an
extended tail (barely visible on the scale of the
data), with a 99.5% quantile at 1400, and the
largest aggregated loss observed during the
simulation over a time span of T at 3400, i.e., larger
by a factor of more than 3 than the expected
loss for the chosen risk horizon T.
FIG. 1: Loss record for a system of $N = 50$
interacting processes with (first panel) $p^{\max} = 0.02$ and
$(p_{ij}/p_i)^{\max} = 2.6$. The low-loss situation coexists
with a high-loss situation; although a spontaneous
total breakdown of the operational system into the
non-operational high-loss phase does not occur
during the simulation, external influences may well
induce such a transition. The second panel has the
same $p^{\max}$ but $(p_{ij}/p_i)^{\max} = 3$. The low-loss
situation is unstable and spontaneously decays to a
high-loss situation via a bubble-nucleation process.




FIG. 2: In a small system (N = 10), repeated
changes between high- and low-loss situations can
be observed. (Horizontal axis: time.)

FIG. 3: Strain simulations described in the main
text. The parameters are p^max = 0.02 as in the
previous figures and (p_ij/p_i)^max = 2.6, 2.7, 2.8, and
2.9.

FIG. 4: Frequency distribution of the aggregated number
of down-processes during a risk horizon of
T = 365 Δt (first panel); loss distribution (second panel).
Total time covered was 10*r. Histograms are not
normalized. Parameters of the system are N = 50,
p^max [value illegible], (p_ij/p_i)^max = 2.5; loss severity
distributions are taken as log-normal, with means
randomly spread over the interval [0,10] and volatilities
chosen randomly as a factor of their respective
means, the maximum factor being 0.4.



D. Mean-field Solution

The principal dynamic properties of the model
exist independently of its heterogeneous nature.
To exhibit systematic relations between
dynamic features and model parameters, we
shall elucidate them in the simplified setting of
a homogeneous network of identical processes,
which we analyze within a mean-field approximation.

For this we assume a homogeneous coupling,
w_ij → w_0/z and p_ij → p_w, where z is the
coordination number of the graph (taken to
be identical at each vertex), and replace
time-dependent quantities in the model's dynamical
equation by long-time stationary averages,
n_j(t) → ⟨n_j(t)⟩. In a homogeneous system,
averages are independent of the process index i
and of time, such that p_i → p and ⟨n_j(t)⟩ → n.
This gives the mean-field equation (without
correlation to the common factor, ρ = 0)

    n = \Phi\left( \Phi^{-1}(p) + w_0 n \right).    (16)

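
The stable solutions of Eq. (16) can be explored numerically. The following is a minimal sketch that iterates the right-hand side of Eq. (16) from a chosen starting value until it stops changing. It assumes that Φ is the standard normal cumulative distribution function; the function name, the tolerance and the choice of simple fixed-point iteration are illustrative assumptions, not part of the original analysis.

```python
from statistics import NormalDist

# Assumption: Phi is the standard normal CDF, Phi^{-1} its inverse.
STD_NORMAL = NormalDist()


def stationary_down_fraction(p, w0, n_start, tol=1e-13, max_iter=100_000):
    """Iterate n <- Phi(Phi^{-1}(p) + w0 * n), starting from n_start.

    Returns the value at which successive iterates differ by less than tol
    (or the last iterate if max_iter is reached).
    """
    h = STD_NORMAL.inv_cdf(p)
    n = n_start
    for _ in range(max_iter):
        n_next = STD_NORMAL.cdf(h + w0 * n)
        if abs(n_next - n) < tol:
            return n_next
        n = n_next
    return n
```

Starting the iteration once from n = 0 and once from n = 1 shows whether one or two stable solutions exist for a given pair (p, w_0): for weak coupling both starts end at the same small n, while for sufficiently strong coupling at small p they end at a small and a near-unity down-fraction, respectively.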



Depending on p and the average coupling
strength, w_0, this equation has either one
unique solution or three solutions, with one unstable
solution at intermediate n and two stable
solutions, n ≈ 0 or n ≈ 1. Figure 5 shows stable
and unstable solutions as functions of w_0 and
p_w for a given value of p.

The phase diagram (Fig. 6) summarizes the regions
in the p-w_0 plane where operational and
non-operational phases of the network coexist,
showing the limits of stability of the low-loss and
high-loss solutions. For p exceeding a critical
value p_c ≈ 0.0218 there is a unique disordered
phase with relatively large values of n.
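
The limits of stability at fixed p can be located approximately by scanning the coupling. The sketch below iterates the mean-field equation n = Φ(Φ^{-1}(p) + w_0 n) from n = 0 and from n = 1 for each coupling on a grid and reports the range of grid values at which the two runs end in clearly different solutions. Φ is assumed to be the standard normal cumulative distribution function; the function names, the grid, and the threshold `gap` used to decide that two solutions are distinct are assumptions of this illustration.

```python
from statistics import NormalDist

# Assumption: Phi is the standard normal CDF.
STD_NORMAL = NormalDist()


def relax(p, w0, n_start, tol=1e-12, max_iter=200_000):
    """Iterate n <- Phi(Phi^{-1}(p) + w0 * n) until it stops changing."""
    h = STD_NORMAL.inv_cdf(p)
    n = n_start
    for _ in range(max_iter):
        n_next = STD_NORMAL.cdf(h + w0 * n)
        if abs(n_next - n) < tol:
            return n_next
        n = n_next
    return n


def coexistence_window(p, w0_grid, gap=0.1):
    """Return (first, last) grid coupling with two distinct stable solutions.

    Returns None if no coupling on the grid shows coexistence.
    """
    coexisting = []
    for w0 in w0_grid:
        low = relax(p, w0, 0.0)    # approached from the operational side
        high = relax(p, w0, 1.0)   # approached from the non-operational side
        if high - low > gap:
            coexisting.append(w0)
    if not coexisting:
        return None
    return coexisting[0], coexisting[-1]
```

The two ends of the returned window bracket the couplings at which the high-loss solution (lower end) and the low-loss solution (upper end) lose stability, to within the grid spacing. Close to these limits the iteration converges slowly, so a fine grid needs a correspondingly generous `max_iter`.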

For a sufficiently small unconditional failure
probability p and weak functional dependence, an
initially running process network will remain in
the running state, despite spontaneous
individual process failures. Such a network
is in a functioning state.

For stronger functional correlation the func- 
tional state of the network becomes unstable. 
Fluctuations in the number of processes that 
are down at any given time can trigger a burst 
or avalanche of failures — a collective phe- 
nomenon corresponding to droplet formation 
associated with a first order phase transition. 

For intermediate degrees of functional depen- 
dence, the network allows for two (meta)stable 
states, an operational and a non-operational 
network. While spontaneous fluctuations in the 
number of failing processes will in large net- 
works fail to trigger transitions between the two 
states, external strain may well do so, as demonstrated
in the strain simulations above. Seen
from the functioning side, the closer the param- 
eters are to an instability line, the smaller the 
strain needed to trigger avalanches of failures 
leading to the fully defunct state of the system. 
This behavior is quantified by the susceptibility
χ (Fig. 7). With support for each process
decreasing by an amount δh < 0, the fraction
of down processes will change by

    \delta n \simeq \chi(w_0, p)\, \delta h    (17)

with

    \chi(w_0, p) = - \frac{\Phi'(\Phi^{-1}(p) + w_0 n)}{1 - w_0 \Phi'(\Phi^{-1}(p) + w_0 n)} ,    (18)

where Φ'(x) = exp[-x^2/2]/√(2π). Note that the
susceptibility is proportional to the sensitivity
of the fraction of down-processes with respect
to the unconditional probability of process failure.

FIG. 5: Mean-field solution for the average fraction
n of down-processes for p ≈ 0.02 as a function of the
coupling parameter. The back-turning part is the
unstable solution n_u(w_0). For n > n_u the system is
driven towards the non-operational high-n solution;
as long as n < n_u, the system is driven back to the
operational low-n solution. In the second panel w_0
is translated into a conditional probability p_w under
the assumption that the coordination number is z = 4
(right curve) and z = 8 (left curve).
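
The growth of the susceptibility towards an instability line can be checked numerically. The sketch below rests on stated assumptions: Φ is taken to be the standard normal cumulative distribution function, and a uniform shift δh of the support is assumed to enter the stationary equation as n = Φ(Φ^{-1}(p) + w_0 n - δh), so that a decrease of support (δh < 0) raises the fraction of down processes. Under these assumptions the linear response is χ = -Φ'/(1 - w_0 Φ'), which the code compares with a finite-difference estimate. All function names are illustrative.

```python
from statistics import NormalDist

# Assumption: Phi is the standard normal CDF, Phi' its density.
STD_NORMAL = NormalDist()


def low_loss_solution(p, w0, dh=0.0, tol=1e-13, max_iter=100_000):
    """Low-n solution of n = Phi(Phi^{-1}(p) + w0 * n - dh), iterated from 0.

    The placement and sign of the support shift dh are assumptions of this
    sketch.
    """
    h = STD_NORMAL.inv_cdf(p) - dh
    n = 0.0
    for _ in range(max_iter):
        n_next = STD_NORMAL.cdf(h + w0 * n)
        if abs(n_next - n) < tol:
            return n_next
        n = n_next
    return n


def susceptibility(p, w0):
    """Linear response dn/d(dh) evaluated at the low-n solution."""
    n = low_loss_solution(p, w0)
    slope = STD_NORMAL.pdf(STD_NORMAL.inv_cdf(p) + w0 * n)
    return -slope / (1.0 - w0 * slope)


def finite_difference_response(p, w0, eps=1e-4):
    """Central-difference estimate of dn/d(dh), for comparison."""
    n_plus = low_loss_solution(p, w0, dh=eps)
    n_minus = low_loss_solution(p, w0, dh=-eps)
    return (n_plus - n_minus) / (2.0 * eps)
```

Evaluating `susceptibility(0.02, w0)` for increasing w_0 shows the magnitude of the response growing as the coupling approaches the value at which the low-loss solution disappears, where the denominator 1 - w_0 Φ' tends to zero.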

Triggering complete failures becomes easier 
close to instability lines for two reasons: (i) 
the susceptibility diverges as instability lines 
are approached; (ii) the unstable solution n_u is
closer to the stable equilibrium solution; external
strain only needs to push the system beyond
n_u to destabilize the functioning state.
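
The role of the unstable solution as a threshold can be illustrated with a short sketch. It assumes the mean-field equation n = Φ(Φ^{-1}(p) + w_0 n) with Φ the standard normal cumulative distribution function, locates the intermediate solution n_u by bisection between the two stable solutions, and then relaxes the system from starting points just below and just above n_u. The function names, the bisection approach and the margins are assumptions made for illustration.

```python
from statistics import NormalDist

# Assumption: Phi is the standard normal CDF.
STD_NORMAL = NormalDist()


def mean_field_map(n, p, w0):
    """Right-hand side of the mean-field equation."""
    return STD_NORMAL.cdf(STD_NORMAL.inv_cdf(p) + w0 * n)


def relax(n, p, w0, tol=1e-13, max_iter=100_000):
    """Iterate the mean-field map from n until it stops changing."""
    for _ in range(max_iter):
        n_next = mean_field_map(n, p, w0)
        if abs(n_next - n) < tol:
            return n_next
        n = n_next
    return n


def unstable_solution(p, w0, margin=1e-6, tol=1e-12):
    """Bisect for the intermediate solution n_u between the stable ones."""
    lo = relax(0.0, p, w0) + margin   # just above the low-n solution
    hi = relax(1.0, p, w0) - margin   # just below the high-n solution
    if hi <= lo or not (mean_field_map(lo, p, w0) < lo
                        and mean_field_map(hi, p, w0) > hi):
        raise ValueError("no coexistence of two stable solutions here")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_field_map(mid, p, w0) < mid:
            lo = mid   # map pulls n down: still below n_u
        else:
            hi = mid   # map pushes n up: at or above n_u
    return 0.5 * (lo + hi)
```

A disturbance that leaves the fraction of down processes below `unstable_solution(p, w0)` relaxes back to the operational solution, whereas one that exceeds it relaxes to the non-operational solution; the distance between the low-n solution and n_u is therefore a rough measure of the strain the network can absorb within this approximation.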

Such a regime could therefore be dangerous for
a network of mutually supporting processes in
a bank. Due to the long periods spent in one of the
metastable states, the bank would not necessarily
realize the potential for big losses due to
bursts and avalanches of process failures. With
a basically unchanged process setup the network
could collapse and cause significant losses,
either due to external strain or rare fluctuations
of internal dynamics. Owing to the stability of
the metastable states, the bank will then have
to expend considerable effort to bring the
network back to a functional state, which will
cause additional costs.

FIG. 6: Mean-field phase diagram for the homogeneous
interacting processes system. At low p operational
and non-operational process networks coexist,
separated by a discontinuous phase transition.
Shown are couplings w_0 corresponding to spinodals
which mark instabilities of fully operational low-loss
and non-operational high-loss situations (upper
and lower curves, respectively). Note that hysteresis
effects are implied. The spinodals merge in
a critical endpoint, where the transition is second
order. (Horizontal axis: p, logarithmic scale.)

FIG. 7: Susceptibility of the mean-field solution for
p ≈ 0.02 as a function of the coupling parameter for the
homogeneous interacting processes system at z = 8. The
susceptibility diverges as the spinodal is approached.



V. CONCLUSION 

In this paper we have outlined how ideas from the
physics of collective phenomena and phase transitions
can naturally be applied to model building
for operational risk in financial institutions.
Our main point was that functional correlations
between mutually supportive processes give rise
to non-trivial temporal correlations, which could
eventually lead to the collective occurrence of
risk events in the form of bursts, avalanches and
crashes. For risks associated with process failure
(operational risks) a functional dependence
seems to be the appropriate way of modeling
sequential correlations.

From the physics point of view, the appropri- 
ate model is rather simple, being a heteroge- 
neous variant of the well studied lattice gas 
model. Despite the heterogeneities, it has a first 
order phase transition (driven by average in- 
teraction strength) at sufficiently low average 
a-priori probability of process failures, show- 
ing coexistence between an overall functioning 
state (gas) and a state of catastrophic break- 
down (liquid). As the a-priori probability of 
process failures is increased the first order tran- 
sition ends at a (liquid/gas) critical point. 

One of the most critical lessons for Risk Control
from our analysis is the possible metastability
of networks of interacting processes: the
bank would not necessarily realize the potential
for big losses due to bursts and avalanches
of process failures, as there are no detectable
precursors to such transitions. With a basically
unchanged process setup the network could collapse
and cause significant losses, either due
to external strain or to rare fluctuations of
internal dynamics. Owing to the stability of the
metastable states, the bank will then have to
expend considerable effort to bring the
network back to a functional state, which will
cause additional costs. To assess the metastability,
banks have to perform stress tests.

It should be noted that realistically the sys- 
tem dynamics after an overall break-down of a 
process network would no longer be the sponta- 
neous internal network dynamics: recovery ef- 
forts would be started, increasing support for 
each process by an amount sufficient to
reinitialize the network in working order. 

In a forthcoming publication we will also show
that the random walk model for financial time
series commonly used in banks can naturally
be extended to incorporate functional dependencies
leading to collective effects. This will
lead to models which, while bearing some resemblance
to agent-based models of markets
(see, e.g., [15]), are different from them in other
respects [16].